knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
library(Seurat)
# library(plotly)
#install.packages("future")
# library(future)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(cowplot)
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
# library(gridExtra)
# library(ggrepel)
library(ggrepel)
load("../data/pre_integrated_SeurObj.RData")
setwd("/Volumes/easystore/SIMR_2019/pio-lab/scripts")
Clustering and annotation of 10X homeo isl1 sibling data
Integration with smartseq2 data, with 2 replicates, and 2 treatments: homeo and 1hr
nFeature_RNA is the number of genes detected in each cell.
nCount_RNA is the total number of molecules detected within a cell.
Low nFeature_RNA for a cell indicates that it may be dead/dying or an empty droplet.
High nCount_RNA and/or nFeature_RNA indicates that the “cell” may in fact be a doublet (or multiplet).
nFeature.vln.homeo.isl1.sib.10X + nCount.vln.homeo.isl1.sib.10X + pct.mito.vln.homeo.isl1.sib.10X
featurescatter.nCountXpercent.mt.homeo.isl1.sib.10X + featurescatter.nCountXnFeature.homeo.isl1.sib.10X
Filter cutoff: #subsetting parameters: omit cells with genes less than 3000 and greater than 200. omit mitochondrial contamination greater than 10%. omit cells with molecules greater than 10,000
Noticable increase in cells/samples from 2979 to ~16000 using an alternative subsetting parameter: homeo.isl1_sib_10X <- subset(homeo.isl1_sib_10X, subset = nFeature_RNA > 200 & nFeature_RNA < 3000 & percent.mt < 10 & nCount_RNA <10000)
homeo.isl1_sib_10X <- subset(homeo.isl1_sib_10X, subset = nFeature_RNA > 200 & nFeature_RNA < 3000 & percent.mt < 10) will give 2979 cells, which is way to conservative, marker genes will not be expressed with this subset
homeo.isl1_sib_10X
## An object of class Seurat
## 25622 features across 16607 samples within 1 assay
## Active assay: RNA (25622 features)
## 2 dimensional reductions calculated: pca, umap
Find Variable Features
VariableFeature.labedlplot2.homeo.isl1_sib_10X
Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) : The total size of the 11 globals that need to be exported for the future expression (‘FUN()’) is 3.18 GiB. This exceeds the maximum allowed size of 0.98 GiB (option ‘future.globals.maxSize’). The three largest globals are ‘object’ (3.17 GiB of class ‘numeric’), ‘features’ (1.67 MiB of class ‘character’) and ‘split.cells’ (1.39 MiB of class ‘list’).
Formerally using set future.global to 1G, change to 3.5G however activit usage on local mac is using 6G/8G. If R session aborts, will have to resort to “sequential” and fargo parallelization.
Parallelization seems to force abort R session, switch to sequential process from now on.
ElbowPlot(homeo.isl1_sib_10X, ndims = 50)
ElbowPlot(homeo.isl1_sib_10X, ndims = 30)
ElbowPlot(homeo.isl1_sib_10X, ndims = 20)
Scree plot shows that variance/eigenvalues in data seem to level off between 15-20 PCs. We’ll get the most information within the first ~15 PCs. Use jackstraw to confirm statistical power.
JackStrawPlot.PC20.homeo.isl1_sib_10X
Showing less statistical significance past >18 PC. Will set PC to 15
PC 15 with resolution 1.2, 14 clusters: gave pretty neat cluster, some single cells outside their prespective groupings however no striking outliers
umap.unlabeled
24 Clusters generated
List of marker genes was taken from Lush (2019) publications
Functions created to help annotate and visualize marker genes:
####FeaturePlotToPNG#### Given list of marker genes, will print individual FeaturePlot and directs the plots to the specified directory
figure_dir <- "isl1_sib_counts_10X_figures"
FeaturePlotToPng <- function(marker.list, dir_name) {
for (x in marker.list){
#to.png <- FeaturePlot(homeo.isl1_sib_10X, features = x, label = TRUE)
print(x)
mypath <- file.path("./", figure_dir, paste(x, ".png", sep = ""))
print(mypath)
png(file=mypath,width = 11, height = 9, units = 'in', res = 300)
print(FeaturePlot(homeo.isl1_sib_10X, features = x, label = TRUE) )
dev.off()}}
####VlnPlotToPNG Given list of marker genes, will print individual Vlnplot and directs the plots to the specified directory
VlnPlotToPng <- function(marker.list, dir_name) {
for (x in marker.list){
#to.png <- FeaturePlot(homeo.isl1_sib_10X, features = x, label = TRUE)
print(x)
mypath <- file.path("./", figure_dir, paste(x, "_vlnplot.png", sep = ""))
print(mypath)
png(file=mypath,width = 11, height = 9, units = 'in', res = 300)
print(VlnPlot(homeo.isl1_sib_10X, features = x, , pt.size = 0))
dev.off()
}
}
#####Top10GeneCellIdentityFunction##### This function will return a adjusted dataframe for the genes with the 10 (or less) highest avg_logFC and its corresponding cluster it will also generate a FeaturePlot and Violin Plot corresponding this function may be helpful as most canonical markers have low signal, this will discern what signal is highest to help identify clusters
Top10GeneCellIdentity <- function(marker.list, df){
#This function will return a adjusted dataframe for the genes with the 10 (or less) highest avg_logFC
#and its corresponding cluster
#it will also generate a FeaturePlot and Violin Plot corresponding
#this function may be helpful as most canonical markers have low signal, this will discern what signal is highest to help identify clusters
top10.df <-filter(df, Gene.name.uniq == marker.list) %>% top_n(n = 10, wt = (avg_logFC)) %>% arrange(desc(avg_logFC))
gene.list <- unique(top10.df$Gene.name.uniq)
feature.plot <- FeaturePlot(homeo.isl1_sib_10X, features = gene.list, label = TRUE)
vln.plt <- VlnPlot(homeo.isl1_sib_10X, features = gene.list, pt.size = 0)
return(list(top10.df, feature.plot, vln.plt))
dev.off()
}
More Robust Marker Gene
Use excel sheet for Lush (2019) suppl file 9 for list of hair cells in their young, mature, prog stages